Low-Rank Modular Reinforcement Learning via Muscle Synergy
Modular Reinforcement Learning (RL) decentralizes the control of multi-joint
robots by learning policies for each actuator. Previous work on modular RL has
proven its ability to control morphologically different agents with a shared
actuator policy. However, as the number of degrees of freedom (DoF) of
robots increases, training a morphology-generalizable modular controller
becomes exponentially more difficult. Motivated by the way the human central
nervous system controls numerous muscles, we propose a Synergy-Oriented LeARning (SOLAR)
framework that exploits the redundancy of DoF in robot control. Actuators
are grouped into synergies by an unsupervised learning method, and a synergy
action is learned to control multiple actuators in synchrony. In this way, we
achieve low-rank control at the synergy level. We extensively evaluate our
method on a variety of robot morphologies, and the results show its superior
efficiency and generalizability, especially on robots with a large DoF like
Humanoids++ and UNIMALs.
Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
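The abstract says actuators are grouped into synergies without supervision and that one learned action drives each synergy. A minimal sketch of that idea follows. It swaps in a simple stand-in for the grouping step: a greedy correlation-threshold grouping over recorded actuator trajectories. This is not necessarily SOLAR's clustering method, and `group_actuators`, `expand` and `threshold` are hypothetical names. The `expand` step shows why control becomes low-rank. Actuator actions are a fixed linear map of k synergy actions with one nonzero weight per actuator, so they span at most k dimensions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy)

def group_actuators(trajectories, threshold=0.9):
    """Hypothetical grouping: assign each actuator to the first synergy whose
    representative trajectory it correlates with (in absolute value) at or
    above `threshold`; otherwise open a new synergy.
    Returns (assignment, signs): a synergy index and a +1/-1 sign per actuator."""
    reps = []  # index of the representative actuator of each synergy
    assignment, signs = [], []
    for i, traj in enumerate(trajectories):
        for s, r in enumerate(reps):
            c = pearson(traj, trajectories[r])
            if abs(c) >= threshold:
                assignment.append(s)
                signs.append(1.0 if c > 0 else -1.0)
                break
        else:
            reps.append(i)
            assignment.append(len(reps) - 1)
            signs.append(1.0)
    return assignment, signs

def expand(synergy_actions, assignment, signs):
    """Map one action per synergy to one action per actuator (a low-rank map)."""
    return [signs[i] * synergy_actions[s] for i, s in enumerate(assignment)]
```

With trajectories `[[0, 1, 2, 3], [0, 2, 4, 6], [3, 2, 1, 0], [1, 0, 1, 0]]`, the first three actuators share one synergy (the third with a negative sign) and the fourth gets its own. Two synergy actions then drive all four actuators.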
Self-Organized Polynomial-Time Coordination Graphs
Coordination graph is a promising approach to model agent collaboration in
multi-agent reinforcement learning. It conducts a graph-based value
factorization and induces explicit coordination among agents to complete
complicated tasks. However, one critical challenge in this paradigm is the
complexity of greedy action selection with respect to the factorized values. It
amounts to a decentralized constraint optimization problem (DCOP), and both the
DCOP and its constant-ratio approximation are NP-hard. To bypass this
systematic hardness, this paper proposes a novel method, named Self-Organized
Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph
classes to guarantee the accuracy and the computational efficiency of
collaborative action selection. SOP-CG employs dynamic graph topologies to ensure
sufficient value function expressiveness. The graph selection is unified into
an end-to-end learning paradigm. In experiments, we show that our approach
learns succinct and well-adapted graph topologies, induces effective
coordination, and improves performance across a variety of cooperative
multi-agent tasks.
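Greedy action selection on a general coordination graph is a DCOP, but restricting the graph class can make it tractable. As an illustration, the sketch below assumes a chain (path) topology. This is one simple polynomial-time class; the abstract does not say which classes SOP-CG actually uses. On a chain, dynamic programming recovers the exact maximizing joint action in O(n·|A|²) time instead of enumerating |A|^n joint actions. `maximize_chain`, `brute_force` and the payoff layout are hypothetical.

```python
from itertools import product

def maximize_chain(unary, pairwise):
    """Exact greedy action selection on a chain coordination graph.

    unary[i][a]       : utility of agent i taking action a
    pairwise[i][a][b] : payoff on edge (i, i+1) for actions (a, b)
    Returns (best_value, best_joint_action) in O(n * |A|^2) time.
    """
    n = len(unary)
    num_actions = len(unary[0])
    # best[a] = best total value of agents 0..i given that agent i plays a
    best = list(unary[0])
    back = []  # back[i-1][b] = best action of agent i-1 given agent i plays b
    for i in range(1, n):
        new_best, choice = [], []
        for b in range(num_actions):
            a_star = max(range(num_actions),
                         key=lambda a: best[a] + pairwise[i - 1][a][b])
            new_best.append(best[a_star] + pairwise[i - 1][a_star][b] + unary[i][b])
            choice.append(a_star)
        best = new_best
        back.append(choice)
    # Backtrack from the best action of the last agent.
    last = max(range(num_actions), key=lambda b: best[b])
    joint = [last]
    for choice in reversed(back):
        joint.append(choice[joint[-1]])
    joint.reverse()
    return best[last], joint

def brute_force(unary, pairwise):
    """Exponential-time reference: enumerate every joint action."""
    n, num_actions = len(unary), len(unary[0])

    def value(joint):
        return (sum(unary[i][joint[i]] for i in range(n))
                + sum(pairwise[i][joint[i]][joint[i + 1]] for i in range(n - 1)))

    best = max(product(range(num_actions), repeat=n), key=value)
    return value(best), list(best)
```

On general graphs with cycles, no such exact polynomial-time recursion is known, which is the hardness the abstract refers to.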
Symmetry-Aware Robot Design with Structured Subgroups
Robot design aims at learning to create robots that can be easily controlled
and perform tasks efficiently. Previous works on robot design have demonstrated
the ability to generate robots for various tasks. However, these works searched
for robots directly in the vast design space and ignored common structures,
resulting in abnormal robots and poor performance. To tackle this problem, we
propose a Symmetry-Aware Robot Design (SARD) framework that exploits the
structure of the design space by incorporating symmetry searching into the
robot design process. Specifically, we represent symmetries with the subgroups
of the dihedral group and search for the optimal symmetry in structured
subgroups. Robots are then designed under the selected symmetry. In this way,
SARD can design efficient symmetric robots while still covering the original
design space, a property we analyze theoretically. We further evaluate SARD empirically on
various tasks, and the results show its superior efficiency and
generalizability.
Comment: The Fortieth International Conference on Machine Learning (ICML 2023)
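The abstract represents candidate symmetries as subgroups of the dihedral group. For D_n (rotations r and reflections s of an n-fold design), every subgroup has one of two forms for some divisor d of n. It is either a rotation group ⟨r^(n/d)⟩ or a dihedral group ⟨r^(n/d), r^i s⟩, so the structured search space can be enumerated explicitly. The sketch below illustrates only that enumeration, not SARD's search procedure. The (k, f) encoding of r^k s^f and the function names are assumptions.

```python
def compose(x, y, n):
    """Multiply r^a s^f1 by r^b s^f2 in the dihedral group D_n,
    using the relation s r^b = r^(-b) s."""
    (a, f1), (b, f2) = x, y
    return ((a + (-1) ** f1 * b) % n, f1 ^ f2)

def generate(gens, n):
    """Smallest subgroup of D_n containing `gens` (closure under composition)."""
    group = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g, n)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return frozenset(group)

def dihedral_subgroups(n):
    """All subgroups of D_n. For each divisor d of n this yields the rotation
    group <r^(n/d)> and the n/d dihedral groups <r^(n/d), r^i s>."""
    subgroups = []
    for d in range(1, n + 1):
        if n % d:
            continue
        step = n // d
        rotation = (step % n, 0)
        subgroups.append(generate([rotation], n))
        for i in range(step):
            subgroups.append(generate([rotation, (i, 1)], n))
    return subgroups
```

For n = 4 this yields τ(4) + σ(4) = 3 + 7 = 10 subgroups: the candidate symmetries a search over D_4 would score.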
Context-Aware Sparse Deep Coordination Graphs
Learning sparse coordination graphs adaptive to the coordination dynamics
among agents is a long-standing problem in cooperative multi-agent learning.
This paper studies this problem and proposes a novel method using the variance
of payoff functions to construct context-aware sparse coordination topologies.
We theoretically consolidate our method by proving that the smaller the
variance of payoff functions is, the less likely action selection will change
after removing the corresponding edge. Moreover, we propose to learn action
representations to effectively reduce the influence of payoff functions'
estimation errors on graph construction. To empirically evaluate our method, we
present the Multi-Agent COordination (MACO) benchmark by collecting classic
coordination problems in the literature, increasing their difficulty, and
classifying them into different types. We carry out a case study and
experiments on the MACO and StarCraft II micromanagement benchmark to
demonstrate the dynamics of sparse graph learning, the influence of graph
sparseness, and the learning performance of our method.
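The edge-selection rule above can be sketched as a ranking by payoff variance. This is a simplified, hypothetical variant. It scores each edge by the population variance of all entries in its pairwise payoff table for the current context and keeps the top k. The paper's exact variance definition may differ, and `sparse_edges` and the table layout are assumptions. The intuition matches the stated result: a payoff that barely varies with the joint action contributes little to action selection, so removing its edge is unlikely to change the chosen actions.

```python
from statistics import pvariance

def sparse_edges(payoffs, k):
    """Keep the k edges whose payoff tables vary the most.

    payoffs maps an edge (i, j) to a table q[a_i][a_j] of payoff estimates
    computed for the current context (observation).
    """
    def score(edge):
        table = payoffs[edge]
        return pvariance([v for row in table for v in row])

    return sorted(payoffs, key=score, reverse=True)[:k]
```

For example, with tables `[[5, 0], [0, 5]]` on edge (0, 1), `[[1, 1], [1, 1]]` on (1, 2) and `[[2, 0], [1, 3]]` on (0, 2), keeping two edges drops (1, 2), whose payoff is constant.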
Deep Contract Design via Discontinuous Networks
Contract design involves a principal who establishes contractual agreements
about payments for outcomes that arise from the actions of an agent. In this
paper, we initiate the study of deep learning for the automated design of
optimal contracts. We introduce a novel representation: the Discontinuous ReLU
(DeLU) network, which models the principal's utility as a discontinuous
piecewise affine function of the design of a contract where each piece
corresponds to the agent taking a particular action. DeLU networks implicitly
learn closed-form expressions for the incentive compatibility constraints of
the agent and the utility maximization objective of the principal, and support
parallel inference on each piece through linear programming or interior-point
methods that solve for optimal contracts. We provide empirical results that
demonstrate success in approximating the principal's utility function with a
small number of training samples and scaling to find approximately optimal
contracts on problems with a large number of actions and outcomes.
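The function a DeLU network is meant to approximate can be computed directly for a small instance. The sketch below does this under the standard hidden-action model, in which the agent picks the action that maximizes expected payment minus cost. The instance data, the tie-breaking rule (in the principal's favour) and `principal_utility` are assumptions for illustration. Each agent action defines one affine piece of the principal's utility. The utility jumps wherever the agent's best response switches, which is the discontinuity the DeLU representation is designed to capture.

```python
def principal_utility(contract, probs, costs, values):
    """Principal's expected utility for a contract (one payment per outcome).

    probs[a][o] : probability of outcome o under agent action a
    costs[a]    : agent's cost of taking action a
    values[o]   : principal's value for outcome o
    Ties in the agent's best response are broken in the principal's favour
    (an assumed convention).
    """
    def agent_utility(a):
        return sum(p * t for p, t in zip(probs[a], contract)) - costs[a]

    def principal_gain(a):
        return sum(p * (v - t) for p, v, t in zip(probs[a], values, contract))

    best = max(agent_utility(a) for a in range(len(costs)))
    # The 1e-12 tolerance absorbs float round-off when comparing utilities.
    responses = [a for a in range(len(costs)) if agent_utility(a) >= best - 1e-12]
    return max(principal_gain(a) for a in responses)
```

In a hypothetical two-action instance (shirk or work, with outcome probabilities `[[0.9, 0.1], [0.2, 0.8]]`, costs `[0.0, 1.0]` and values `[0.0, 10.0]`), the contract pays t on success only. The agent switches to working at t = 1/0.7 ≈ 1.43. Raising t from 1.4 to 1.5 therefore moves the principal's utility from 0.86 to 6.8.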
RODE: Learning Roles to Decompose Multi-Agent Tasks
Role-based learning holds the promise of achieving scalable multi-agent
learning by decomposing complex tasks using roles. However, it is largely
unclear how to efficiently discover such a set of roles. To solve this problem,
we propose to first decompose joint action spaces into restricted role action
spaces by clustering actions according to their effects on the environment and
other agents. Learning a role selector based on action effects makes role
discovery much easier because it forms a bi-level learning hierarchy -- the
role selector searches in a smaller role space and at a lower temporal
resolution, while role policies learn in significantly reduced primitive
action-observation spaces. We further integrate information about action
effects into the role policies to boost learning efficiency and policy
generalization. By virtue of these advances, our method (1) outperforms the
current state-of-the-art MARL algorithms on 10 of the 14 scenarios that
comprise the challenging StarCraft II micromanagement benchmark and (2)
achieves rapid transfer to new environments with three times the number of
agents. Demonstrative videos are available at
https://sites.google.com/view/rode-marl
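The first step described above, splitting the action space into restricted role action spaces by clustering actions on their effects, can be sketched with plain k-means. The sketch assumes each action already has a fixed action-effect embedding; in the method these representations are learned from the environment, whereas here they are hypothetical inputs. `kmeans` uses deterministic farthest-point initialisation so the result is reproducible, and `role_action_spaces` is a hypothetical name.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means on tuples, with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return labels

def role_action_spaces(action_effects, k):
    """Group action ids into k restricted role action spaces."""
    labels = kmeans(action_effects, k)
    return [[a for a, lab in enumerate(labels) if lab == c] for c in range(k)]
```

With six hypothetical 2-D effect embeddings, three near (0, 0) (e.g. movement) and three near (5, 5) (e.g. attacks), `role_action_spaces(effects, 2)` returns `[[0, 1, 2], [3, 4, 5]]`. A role selector would then choose among these two smaller action sets rather than the full action space.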